3  Step 1 - Check Column Consistency

4 Problem description

The idea behind this was simply to bind rows of every spreadsheet and check for warnings and errors. If column types were different errors had to be risen and formats would be subject of warnings.

The first step of checking the spreadsheets consisted in trying to read all of them and binding their rows. By doing this, if the column type was not the same, we then had to set the columns with the same type. Also, by reading each spreadsheet, read_excel may warns us if something is wrong.

5 Problem solving

We defined for each column over the 5 spreadsheets which type all of them were. We populated a table directly in Excel and then we read using the following commands:

Code
col_types <- list()

path <- "support/column_types.xlsx"

sheet_names <- readxl::excel_sheets(path)

for (i in 1:length(sheet_names)) {
  col_types[[sheet_names[i]]] <- readxl::read_excel(path = path, sheet = i)
}

# Showing an example of the output
col_types[1]
$Underpasses
# A tibble: 17 × 2
   Column              Type   
   <chr>               <chr>  
 1 Infrastructure_type text   
 2 Structure_ID        text   
 3 Structure_type      text   
 4 Structure_cell      text   
 5 Structure_shape     text   
 6 Structure_photo     text   
 7 Structure_age       numeric
 8 Structure_height    numeric
 9 Structure_length    numeric
10 Structure_width     numeric
11 Waterbody_width     numeric
12 Latitude            numeric
13 Longitude           numeric
14 UTM Zone            text   
15 X (Easting)         numeric
16 Y (Northing)        numeric
17 Datum               text   

The following process has to be repeated for all the spreadsheets, however here we illustrate using the “Underpasses” spreadsheet. In order to collect the messages provided by R we adapted the read_excel function with a purrr::quietly function, that is more appropriate to show warnings and messages for each file/author. We enchain the customized read_sheet function that provides the full paths of all .xlsx files available to the purrr::quietly function.

In the sequence, we are able to collect the warnings (in this case, under_warns), check and correct them directly on the Excel files. The majority of the errors are misspelled text, decimal markers, date and time separators and so on…

After we zeroing the warnings, we are able to check if R successfully bind the rows from each author. If no warnings or errors are raised, it means that we reached our goal.

Code
#Underpasses ----
source("R/FUNCTIONS.R")

under_q <- purrr::quietly(function(file){
  readxl::read_excel(
    file,
    col_types = col_types[["Underpasses"]]$Type,
    sheet = "Underpasses",
    na = c("NA", "na"),
    col_names = TRUE
  ) |>
    janitor::remove_empty("rows")
})

under_all_outputs <- read_sheet(path = "Excel", results = FALSE) |>
  purrr::map(under_q)

under_warns <- under_all_outputs |>
  purrr::map(\(x) purrr::pluck(x, "warnings")) |>
  purrr::compact()

under_results <- under_all_outputs |>
  purrr::map(\(x) purrr::pluck(x, "result")) |>
  dplyr::bind_rows(.id = "Dataset")

under_results |>
  print(n = 10)
# A tibble: 357 × 20
   Dataset        Infrastructure_type Structure_ID Structure_type Structure_cell
   <chr>          <chr>               <chr>        <chr>          <chr>         
 1 Alberto_Gonza… Carretera           PSF1         Paso de fauna  1             
 2 Alberto_Gonza… Carretera           PSF3         Paso de fauna  3             
 3 Alberto_Gonza… Carretera           PSF5         Paso de fauna  5             
 4 Alberto_Gonza… Carretera           PSF7         Paso de fauna  7             
 5 Alberto_Gonza… Carretera           PSF9         Paso de fauna  9             
 6 Alberto_Gonza… Carretera           PSF11        Paso de fauna  11            
 7 Alberto_Gonza… Carretera           PSF13        Paso de fauna  13            
 8 Alberto_Gonza… Carretera           PSF15        Paso de fauna  15            
 9 Ana_Delciellos Rodovia             PNSB01       Passagem de f… 1             
10 Ana_Delciellos Rodovia             PNSB02       Passagem de f… 1             
# ℹ 347 more rows
# ℹ 15 more variables: Structure_shape <chr>, Structure_photo <chr>,
#   Structure_age <dbl>, Structure_height <dbl>, Structure_length <dbl>,
#   Structure_width <dbl>, Waterbody_width <dbl>, Latitude <dbl>,
#   Longitude <dbl>, `UTM Zone` <chr>, `X (Easting)` <dbl>,
#   `Y (Northing)` <dbl>, Datum <chr>, Structure_lenght <dbl>,
#   structure_length <dbl>